Scrub recent data #17759
base: master
The last push rebases onto master to get fresh test results.
I am not sure about the motivation for this change, and accordingly about whether the chosen solution is adequate. If a device was lost, but it was redundant and its loss did not cause pool suspension, then ZFS should already record the TXGs during which the device was offline and not properly updated, so that it can start a resilver for that period when
Sponsored-By: Wasabi Technology, Inc.
Sponsored-By: Klara Inc.
Signed-off-by: Mariusz Zaborski <[email protected]>
Recent data is defined as data written after the last known point in the TXG database minus a user-defined time interval (default: 4h). This feature can be triggered using either of the following commands: `zpool clear -s` or `zpool scrub -R`.
Sponsored-By: Wasabi Technology, Inc.
Sponsored-By: Klara Inc.
Signed-off-by: Mariusz Zaborski <[email protected]>
I think Mav has a point here that maybe for the
Motivation and Context
When a disk goes offline and is later brought back online with `zpool clear`, the user may also want to trigger a scrub of the recently written data to ensure there is no corruption.
For user convenience, we provide a "recent" scrub option in `zpool scrub`.
Description
We use `scn_max_txg` and `scn_min_txg`, which are already part of the scrub mechanism, to implement this feature.
Since we (developers) don't want to expose a configuration that specifies "the number of TXGs" (as discussed in #16301), we instead use a time-based definition. Users can set the time using one of three units (seconds, hours, or days) for convenience. A TXG database is used to map these time ranges to valid TXG numbers.
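The time-to-TXG mapping can be illustrated with a small Python sketch. This is not the code from this PR: the in-memory `TXG_LOG` list, the single-letter unit suffixes, and the helper names `parse_interval` and `recent_scrub_range` are all hypothetical and only model the idea. The window is anchored at the last entry recorded in the database rather than at the current time, and the returned pair loosely corresponds to a minimum and maximum TXG bound for the scrub.

```python
import bisect

# Hypothetical in-memory stand-in for the TXG database: (unix_time, txg)
# pairs sorted by time. The real on-disk format is not modeled here.
TXG_LOG = [
    (1_000, 10),
    (4_600, 25),
    (8_200, 40),
    (15_400, 70),
]

# Units named in the description (seconds, hours, days). The suffix
# letters are an assumption made for this sketch.
UNIT_SECONDS = {"s": 1, "h": 3600, "d": 86400}

DEFAULT_INTERVAL = "4h"  # default quoted in the commit message


def parse_interval(text):
    """Convert a string such as '90s', '4h' or '2d' to seconds."""
    value, unit = text[:-1], text[-1:]
    if unit not in UNIT_SECONDS or not value.isdigit():
        raise ValueError(f"unsupported interval: {text!r}")
    return int(value) * UNIT_SECONDS[unit]


def recent_scrub_range(log, interval=DEFAULT_INTERVAL):
    """Return (min_txg, max_txg) covering the 'recent' window.

    The window ends at the last recorded entry, not at the current
    wall-clock time, and starts `interval` before that entry.
    """
    if not log:
        raise ValueError("TXG database is empty")
    last_time, last_txg = log[-1]
    cutoff = last_time - parse_interval(interval)
    times = [t for t, _ in log]
    # Pick the latest entry at or before the cutoff, so the range errs
    # on the side of scrubbing slightly more; clamp to the oldest entry.
    idx = max(bisect.bisect_right(times, cutoff) - 1, 0)
    return log[idx][1], last_txg
```

Choosing the entry at or before the cutoff, rather than the first one after it, is a conservative choice in this sketch: it may scrub slightly more than the requested interval but never less.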
Using time instead of TXG introduces an interesting problem: if the pool has been offline for days or weeks, and the user runs `zpool clear -s`, it may do nothing, since no recent data has been written. To handle this case, we decided to take the last known TXG from the database and use that as the starting point for scrubbing.
How Has This Been Tested?
New tests have been added for this feature.
Types of changes
Checklist:
Signed-off-by.